2 Logistic Regression

2.1 The General Linear Model

Y = b_0 + b_1 * x

What the formula above establishes is a relationship between our response variable (Y) and our explanatory variable (x).

What it says exactly is that the value of Y depends on the value of x. Here b_0 (the intercept) is the value of Y when x = 0, and b_1 (the slope) describes how much Y changes for each one-unit increase in x.
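To make the roles of b_0 and b_1 concrete, here is a minimal Python sketch. The function name simple_linear_model and the coefficient values are hypothetical, chosen only for illustration.

```python
def simple_linear_model(x: float, b_0: float, b_1: float) -> float:
    """Return Y = b_0 + b_1 * x."""
    return b_0 + b_1 * x

# Hypothetical coefficients, chosen only for illustration.
b_0, b_1 = 2.0, 0.5

# At x = 0 the model returns the intercept b_0.
print(simple_linear_model(0.0, b_0, b_1))  # 2.0

# Each one-unit increase in x changes Y by the slope b_1.
for x in range(4):
    change = simple_linear_model(x + 1, b_0, b_1) - simple_linear_model(x, b_0, b_1)
    print(x, change)  # the change is 0.5 every time
```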

This should remind most readers of the formula from grade school:

y = mx + b,

where the slope m plays the role of b_1 and the intercept b plays the role of b_0.

Yes, it actually has value in the real world!

This is the simplest version of this model, describing a linear relationship between the response and a single explanatory variable (a single x).

What we want to do is build up to a formula with multiple explanatory variables: for CCFD we will not be looking at a single variable, but at several variables that together help determine whether or not a transaction can be predicted as fraudulent.
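Written in the same notation as the single-variable formula, a model with n explanatory variables x_1, ..., x_n would take the following form (a sketch of where this build-up leads, assuming one coefficient per variable):

```latex
Y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n = b_0 + \sum_{i=1}^{n} b_i x_i
```

With n = 1 this reduces to the original formula, Y = b_0 + b_1 x.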

This formula is incredibly valuable for statistical modeling for several reasons:

  • Under its standard assumptions, regression estimates the parameters without bias.

  • Regression explicitly allows for random error, such as measurement error, which can take place at every step of the process.

  • Regression helps us quantify and control the uncertainty in our estimates.
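Parameter estimation can be illustrated with ordinary least squares (OLS), the standard way of estimating b_0 and b_1 from paired observations. The sketch below assumes the closed-form OLS estimates for the single-variable model; the function name fit_simple_ols and the sample data are hypothetical. Under the usual assumptions (errors with mean zero given x), these estimates are unbiased.

```python
from statistics import mean

def fit_simple_ols(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Estimate (b_0, b_1) for Y = b_0 + b_1 * x by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("x must take at least two distinct values")
    b_1 = s_xy / s_xx
    # Intercept: the fitted line passes through the point (x_bar, y_bar).
    b_0 = y_bar - b_1 * x_bar
    return b_0, b_1

# Hypothetical data lying exactly on the line Y = 1 + 2x.
print(fit_simple_ols([0, 1, 2, 3, 4], [1, 3, 5, 7, 9]))  # (1.0, 2.0)
```

On data that lie exactly on a line, the sketch recovers that line's intercept and slope; on real, noisy data the estimates only approximate the true parameters.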

Because the model is linear, meaning that the thing we are trying to predict (the response) is described as a weighted sum of the explanatory variables, with the coefficients b_0, b_1, ..., b_n as the parameters, we can analyze how a single variable (x_n) affects our prediction while holding the values of the other x’s fixed: increasing x_n by one unit changes the prediction by b_n.

This allows us
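The property that changing one explanatory variable, while holding the others fixed, moves the prediction by exactly that variable's coefficient can be checked with a small Python sketch. The function linear_predictor and all numbers below are hypothetical illustrations, not values from the CCFD data.

```python
def linear_predictor(b_0: float, coefs: list[float], xs: list[float]) -> float:
    """Return b_0 + b_1*x_1 + ... + b_n*x_n."""
    if len(coefs) != len(xs):
        raise ValueError("need one coefficient per explanatory variable")
    return b_0 + sum(b * x for b, x in zip(coefs, xs))

# Hypothetical intercept, coefficients and variable values (not fitted to real data).
b_0 = 0.5
coefs = [0.25, -1.0, 2.0]
xs = [4.0, 1.0, 3.0]

base = linear_predictor(b_0, coefs, xs)
for n in range(len(xs)):
    bumped = xs.copy()
    bumped[n] += 1.0  # change only x_n; every other x stays fixed
    delta = linear_predictor(b_0, coefs, bumped) - base
    print(f"x_{n + 1}: change in prediction = {delta}, coefficient = {coefs[n]}")
```

Each printed change matches the corresponding coefficient. With arbitrary floating-point inputs the two may differ by tiny rounding errors, so a comparison in practice would use a tolerance.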

2.2 Basics of Logistic Regression Modeling
